6726-6741 Nucleic Acids Research, 2014, Vol 42, No. 10 
doi: 10.1093/nar/gku269



Published online 17 April 2014 



DNA-protein π-interactions in nature: abundance,
structure, composition and strength of contacts 
between aromatic amino acids and DNA nucleobases 
or deoxyribose sugar 

Katie A. Wilson, Jennifer L. Kellie and Stacey D. Wetmore*

Department of Chemistry and Biochemistry, University of Lethbridge, 4401 University Drive West, Lethbridge, AB, 
T1K 3M4, Canada

Received February 25, 2014; Revised March 19, 2014; Accepted March 21, 2014



ABSTRACT 

Four hundred twenty-eight high-resolution DNA-protein complexes were chosen for a bioinformatics study. Although 164 crystal structures (38% of those searched) contained no interactions, 574 discrete π-contacts between the aromatic amino acids and the DNA nucleobases or deoxyribose were identified using strict criteria, including visual inspection. The abundance and structure of the interactions were determined by unequivocally classifying the contacts as either π-π stacking, π-π T-shaped or sugar-π contacts. Three hundred forty-four nucleobase-amino acid π-π contacts (60% of all interactions identified) were identified in 175 of the crystal structures searched. Unprecedented in the literature, 230 DNA-protein sugar-π contacts (40% of all interactions identified) were identified in 137 crystal structures; these involve C-H···π and/or lone-pair···π interactions, can involve any of the aromatic amino acids and can be classified according to the sugar atoms involved. Both π-π and sugar-π interactions display a range of relative monomer orientations and therefore interaction energies (up to -50 (-70) kJ mol⁻¹ for neutral (charged) interactions, as determined using quantum chemical calculations). In general, DNA-protein π-interactions are more prevalent than perhaps currently accepted, and the role of such interactions in many biological processes may yet be uncovered.

INTRODUCTION 

DNA-protein interactions are essential to life. Indeed, the genetic information contained in the sequence of DNA nucleobases (A, C, T and G) must be processed by enzymes, which transcribe the nucleobase code into RNA and subsequently generate new proteins. Alternatively, proteins can bind to DNA in order to replicate the nucleobase sequence as cells grow and divide. DNA-protein interactions are also evident in other critical cellular processes, such as the repair of DNA damage caused by carcinogenic compounds or UV light (1-4). Contacts between DNA and proteins are typically noncovalent, which allows the resulting complex to perform necessary biological functions yet readily dissociate, such that both biomolecules can provide additional function to the cell (5,6). The noncovalent contacts between DNA and proteins have traditionally been categorized as (direct or water-mediated) hydrogen bonding, ionic (salt bridges or DNA backbone interactions) and other forces, including van der Waals and hydrophobic interactions (7-9). Understanding each class of DNA-protein contacts will provide a greater appreciation of critical cell functions and open the door for the development of new medicinal and biological applications, including rational drug design (10-12) and the control of gene expression (13-16).

To gain an understanding of the interactions between DNA and proteins, previous work has searched crystal structures published in the Protein Data Bank (PDB) and determined the relative frequency of different types of contacts. Early studies in this area were limited by the lack of high-resolution crystal structures of DNA-protein complexes (17-20). While this problem has been overcome in the past decade (7,21-23), more recent works disagree about the relative frequency of different types of contacts. Indeed, characterization of 129 DNA-protein complexes suggests that van der Waals interactions are more common than (direct or water-mediated) hydrogen bonding (7). In contrast, a survey of 139 DNA-protein complexes suggests that hydrogen bonding is more frequent than van der Waals, hydrophobic or electrostatic interactions (22). Such discrepancies may arise because, unlike hydrogen bonding, the structure of van der Waals interactions is only loosely defined, and therefore there is likely substantial variation among the interactions included in this category. Regardless, both studies determined that van der Waals interactions compose more than 30% of DNA-protein contacts (7,22).

In addition to traditional classifications of DNA-protein interactions, careful examination of the list of contacts identified in previous works suggests that many interactions occur between the DNA nucleobases and the aromatic amino acids (Supplementary Figure S1) (7,22). In general, interactions between aromatic rings are known to be widespread throughout chemistry and biology (24,25). Indeed, the prevalence and potential importance of interactions between aromatic side chains in proteins (26-31), as well as at protein-protein interfaces (32), have been documented through PDB searches. Furthermore, investigation of 89 RNA-protein complexes suggests that RNA-protein van der Waals interactions are more prevalent than hydrogen bonding, with the most favoured nucleotide-amino acid pairs including the aromatic amino acids (specifically, the U:Tyr, A:Phe and G:Trp pairs) (33), while a search of 61 structures revealed an abundance of interactions between Trp and the purines (8). Collectively, these studies suggest that closer investigations of DNA-protein π-π interactions are warranted.

Among the first studies to specifically consider DNA-protein π-π contacts, Mao et al. investigated the molecular recognition of adenosine 5'-triphosphate (ATP) by different proteins, and determined that π-π interactions between A and the aromatic amino acids are essential for substrate binding, with a 2.7:1.0 ratio of DNA-protein hydrogen-bonding to π-π contacts (34). Subsequently, Baker and Grant identified a large number of π-π interactions between the DNA nucleobases and Tyr, Phe, His or Trp in 141 DNA-protein complexes (8). Unfortunately, the overall trends in the relative abundances of A-amino acid pairs differ significantly between these two studies. This discrepancy may arise from differences in the structures searched, but is more likely an artefact of the (distance-only) search criteria implemented. Indeed, ring proximity alone does not guarantee a suitable relative orientation of two residues, and therefore not all previously characterized interactions correspond to π-π (stacking or T-shaped) contacts (Supplementary Figure S2). Thus, the true frequency and structure of these interesting aromatic interactions between DNA and proteins remain unclear. Nevertheless, the proximity of the nucleobases and aromatic amino acids suggests that aromatic-aromatic (π-π or C/N-H···π) interactions may help stabilize DNA-protein complexes or may be involved in nucleic acid recognition.

Recent works corroborate that modern computational techniques can provide important information about π-π interactions (see, for example, references 35-39). In terms of DNA-protein contacts, quantum chemical calculations have been used to clarify the strength of π-π contacts between the nucleobases and aromatic amino acids found in experimental crystal structures (8,34,40-42). To complement these data, the preferred (lowest energy) relative monomer orientations have been identified for isolated dimers by systematically changing the relative orientations of monomers of fixed geometry (41,43-46) or fully relaxed systems (40-42). Both π-π stacking (face-to-face) (41,43-46) and π-π T-shaped (edge-to-face) (41,43-46) contacts have been considered in these studies (Figure 1A and B). Our group has completed the most extensive investigations, where over 1000 relative monomer orientations were considered for each nucleobase-aromatic amino acid pair to determine the preferred relative monomer orientation (46-49). Our highly accurate calculations suggest that the strengths of these π-π stacking and T-shaped interactions, calculated as the energy difference between the dimer and the individual monomers, are up to approximately -43 kJ mol⁻¹ (46,50). This suggests that π-π contacts can contribute to DNA-protein binding and/or stabilize DNA-protein complexes to the same extent as hydrogen bonding. Furthermore, our group investigated the enhancement in the binding energy due to charge by considering dimers involving cationic His (49) or a damaged (cationic alkylated) nucleobase (47,51,52), as well as the effects of water molecules on the stability of charged dimers (53). Although most of these studies were performed on model systems that only include aromatic rings, extending the computational model to include the biological backbone (54-56) or additional π-π contacts (57) has been found to minimally affect the strength of individual contacts. Together, these works provide important details about the preferred structure and magnitude of DNA-protein π-π interactions, and their potential biological roles.

Figure 1. Examples of (A) nucleobase-amino acid π-π T-shaped interaction (PDB ID: 2WQ7), (B) nucleobase-amino acid π-π stacking interaction (PDB ID: 3MR5) and (C) deoxyribose-amino acid sugar-π interaction (PDB ID: 3BKZ).

In addition to interactions with the DNA nucleobases, analysis of crystal structures reveals a significant number of short distances between the aromatic amino acids and the DNA backbone (7,22). Although many of these likely correspond to ionic contacts or hydrogen bonding with the phosphate moiety, a significant number of interactions were deemed to specifically involve the deoxyribose sugar. Indeed, all aromatic amino acids were found to participate in these interactions in nature. Despite the short distances between the sugar and the aromatic amino acids, the nature of these contacts has yet to be explicitly discussed in the literature.

In contrast to π-interactions involving the DNA sugar moiety, contacts between various carbohydrates and the aromatic amino acids have been identified in crystal structures (58-61), and the importance of these contacts has been accepted in many fields, including glycobiology (see, for example, (62)-(68) and references therein) and nanotechnology (see, for example, (69)-(74) and references therein). The significant strength of carbohydrate-π contacts in crystal structures has been verified using computational methods (58-61). Other modeling studies have characterized the binding strengths of dimers between different carbohydrates and aromatic amino acids modeled as benzene (Phe) (73,75-79), toluene (Phe) (80-82), phenol (Tyr) (83) and/or indole (Trp) (80,83), or with the protein backbone included (84,85). Complexes involving naphthalene have also been considered in an effort to better understand the properties of carbohydrate C-H···π interactions (86). These works have collectively determined that the amino acid can interact with either side (face) of the carbohydrate. The strengths of the carbohydrate-π interactions depend on the carbohydrate, the amino acid and the relative monomer orientation, and are up to approximately -50 kJ mol⁻¹, with the most stable structures containing both carbohydrate-π contacts and hydrogen bonding (with an exocyclic hydroxyl group). Interestingly, carbohydrate-π interactions involving a DNA nucleobase have also been characterized (87-90).

By analogy to the importance of carbohydrate-π interactions to glycobiology, it is reasonable to propose that π-contacts between the DNA deoxyribose moiety and the aromatic amino acids in proteins may provide stability and/or function in DNA-protein complexes. Furthermore, previous work on carbohydrate-π interactions suggests that deoxyribose contacts could involve C-H···π and/or hydrogen-bonding interactions (via the hydroxyl groups) with the amino acid π-system. From a fundamental perspective, the ring size differs notably between deoxyribose (a furanose) and the most widely studied carbohydrates (pyranoses), which could substantially affect the structure and energetics of the π-interactions. Although interactions predominantly involve one of the two carbohydrate faces, contacts may also occur with the sides of deoxyribose due to the relative positions of the ring hydrogen atoms.

In the current study, over 400 high-resolution DNA-protein complexes available in the PDB were searched to definitively determine the frequency and characterize the nature (structure, composition and strength) of contacts between the aromatic amino acids (including cationic His) and the DNA nucleobases (π-π contacts, Figure 1A and B) or the deoxyribose moiety (sugar-π contacts, Figure 1C). Unprecedented in the DNA-protein interaction literature, all nucleobase-aromatic amino acid dimers identified were visually inspected to unequivocally verify that each contact represents a π-π interaction, and to classify the contact as either a nucleobase-amino acid stacking or T-shaped interaction (Figure 1A and B); T-shaped contacts can involve either a nucleobase edge interacting with an amino acid π-system (face) or an amino acid edge interacting with the nucleobase face. Although experimental data can be used to identify contacts in nature, they provide no information about the strength of these interactions. Therefore, accurate quantum chemical methods were used to evaluate the binding energy of each dimer system found in the crystal structures. Our study thereby clarifies previous literature by providing the most complete information to date on DNA-protein π-π interactions in nature. Using the same thorough approach, deoxyribose-aromatic amino acid sugar-π interactions in experimental crystal structures have been quantified for the first time, and determined to be based on many different types of noncovalent interactions known in structural chemistry, including C-H···π (Figure 1C) and lone-pair···π contacts. As a result, a novel classification system is developed based on the nature of the edge of the sugar. Combining data on the natural occurrence and strength of these two broad classes of DNA-protein interactions provides important information that will help unveil their potential roles in many biological systems.

MATERIALS AND METHODS 

Datasets 

DNA-protein complexes were identified in the PDB using criteria similar to those previously used in the literature to detect nucleobase-amino acid π-π contacts (Supplementary Figure S3) (8,30). Specifically, X-ray crystal structures published before 24 May 2011 with a resolution better than 2.0 Å and less than 90% sequence identity were chosen for analysis (428 crystal structures in total).
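The structural criteria above can be expressed as a simple filter. The sketch below is illustrative only: the `Entry` record and its field names are hypothetical (they do not correspond to any specific PDB API), "better than 2.0 Å" is interpreted as strictly below 2.0 Å, and the publication date is approximated by a release date. The 90% sequence-identity redundancy filter is not shown, since it requires a separate clustering step.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


# Hypothetical record for one PDB entry; field names are illustrative only.
@dataclass
class Entry:
    pdb_id: str
    method: str         # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float   # resolution in angstroms
    released: date      # used here as a proxy for the publication date
    has_dna: bool
    has_protein: bool


CUTOFF_DATE = date(2011, 5, 24)
MAX_RESOLUTION = 2.0  # angstroms; "better than 2.0 A" read as strictly below 2.0


def passes_criteria(e: Entry) -> bool:
    """True for X-ray DNA-protein complexes meeting the date and resolution limits."""
    return (
        e.method == "X-RAY DIFFRACTION"
        and e.has_dna
        and e.has_protein
        and e.resolution < MAX_RESOLUTION
        and e.released < CUTOFF_DATE
    )


def select_entries(entries: Iterable[Entry]) -> List[str]:
    """PDB IDs passing the structural criteria (redundancy filtering not included)."""
    return [e.pdb_id for e in entries if passes_criteria(e)]
```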



Selecting systems for analysis 

PyMOL (91) was used to select all aromatic amino acids and nucleobase or deoxyribose moieties separated by less than 5.0 Å in each crystal structure. This choice of distance is supported by computational studies that determined that the optimal vertical separation in DNA-protein nucleobase-aromatic amino acid dimers is typically 3.5 Å (45,46). As outlined in the Introduction, the qualifying DNA-protein dimers were then visually inspected to verify that each contact is a π-interaction and to classify it as a nucleobase-amino acid stacking, nucleobase-amino acid T-shaped (nucleobase or amino acid edge) or deoxyribose sugar-π interaction. The PDB IDs for the crystal structures searched in the present work, as well as the type(s) of interactions identified and the nucleobase/sugar-amino acid residues involved, are provided in the SI.
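The distance pre-screen can be sketched in plain Python, assuming each residue or DNA moiety is supplied as a list of atomic coordinates in Å. This is not the PyMOL selection used in the study; the identifiers and input format are hypothetical. As the text stresses, a pair passing this test is only a candidate: an orientation check (visual inspection in the study) is still required to confirm a π-interaction.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
CUTOFF = 5.0  # angstroms, as used in the text


def min_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Shortest distance between any atom of group A and any atom of group B."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def candidate_pairs(aromatic: Dict[str, Sequence[Coord]],
                    dna: Dict[str, Sequence[Coord]],
                    cutoff: float = CUTOFF) -> List[Tuple[str, str, float]]:
    """(residue ID, DNA moiety ID, distance) for every pair closer than `cutoff`.

    Both inputs map a hypothetical identifier to that group's coordinates.
    """
    hits = []
    for res_id, res_atoms in aromatic.items():
        for moiety_id, moiety_atoms in dna.items():
            d = min_distance(res_atoms, moiety_atoms)
            if d < cutoff:
                hits.append((res_id, moiety_id, d))
    return hits
```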
Geometries used for quantum mechanical calculations

For the nucleobase-amino acid π-π interactions, the interplanar angle between the two rings, denoted as the tilt (ω, Figure 1), was measured using Mercury (92) and used to further classify the π-π interaction as stacked (ω = 0-20°), T-shaped (ω = 70-90°) or inclined (20° < ω < 70°). Mercury was also used to measure the closest heavy atom distance between monomers. The dimer binding strengths were determined using truncated models obtained by replacing the DNA or protein backbone with a hydrogen atom (Supplementary Figure S1). Previous research has shown that neglect of the DNA or protein backbone does not significantly affect the magnitude of the π-π contact (52,54,55). For His interactions, both a cationic (His⁺) model and two neutral tautomeric models (Supplementary Figure S1) were considered due to the unique pKa of this amino acid, and therefore the varied protonation states adopted in biological systems (93). Additionally, the hydroxyl group of Tyr was oriented in two directions, denoted as clockwise (CW) and counterclockwise (CCW) according to the direction of the hydroxyl moiety when the dimer is oriented with Tyr below the nucleobase (see Supplementary Figure S1). The planar (Cs-symmetric) monomers were aligned by overlaying MP2/6-31G(d)-optimized geometries onto the crystal structure orientation according to root-mean-square (RMS) fitting of the ring heavy atoms using HyperChem 8.0.8 (94).
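The tilt-based classification can be sketched as follows. This is an assumption-laden stand-in for the Mercury measurement: each ring's plane normal is approximated with Newell's method over the ring heavy atoms (rather than a formal least-squares plane fit), and ω is the angle between the two normals folded into 0-90°. The thresholds are those stated in the text: stacked for ω ≤ 20°, T-shaped for ω ≥ 70°, inclined otherwise.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ring_normal(ring: Sequence[Coord]) -> Coord:
    """Approximate plane normal of a (near-)planar ring using Newell's method.

    `ring` lists the ring heavy-atom coordinates in bonding order.
    """
    nx = ny = nz = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1, z1 = ring[i]
        x2, y2, z2 = ring[(i + 1) % n]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    return (nx, ny, nz)


def tilt_angle(ring_a: Sequence[Coord], ring_b: Sequence[Coord]) -> float:
    """Interplanar angle omega in degrees, folded into the range 0-90."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    dot = sum(p * q for p, q in zip(na, nb))
    cos_w = min(1.0, abs(dot) / (math.hypot(*na) * math.hypot(*nb)))
    return math.degrees(math.acos(cos_w))


def classify(omega: float) -> str:
    """Stacked (0-20 deg), T-shaped (70-90 deg) or inclined (in between)."""
    if omega <= 20.0:
        return "stacked"
    if omega >= 70.0:
        return "T-shaped"
    return "inclined"
```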

For all identified sugar-π interactions, the amino acid was initially overlaid (using RMS fitting) onto the crystal structure geometry as discussed for the nucleobase-amino acid interactions (94). However, due to variations in the sugar pucker throughout the crystal structures, and the anticipated effect of sugar puckering on the binding energy, a fully optimized isolated sugar could not be overlaid onto the crystal structure. Instead, the sugar moiety was first truncated by replacing the nucleobase, as well as the 5' and 3' phosphorus atoms, with hydrogen atoms (Supplementary Figure S1). Subsequently, the positions of all hydrogen atoms in the sugar-amino acid dimer were optimized at the MP2/6-31G(d) level of theory while the heavy atoms were held fixed. The ∠(C4'-C5'-O5'-H) and ∠(C4'-C3'-O3'-H) dihedral angles in the sugar (Supplementary Figure S1) were also frozen at the crystal structure values during the optimizations, in order to constrain the orientation of the hydrogen atoms at the O5' and O3' truncation points. This approach for sugar-π contacts is justified by studies revealing that neither the structures nor the binding strengths of carbohydrate-π interactions deviate significantly (< 2 kJ mol⁻¹) when crystal structures or fully optimized geometries are considered (58).
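RMS fitting of an optimized monomer onto crystal-structure coordinates (performed with HyperChem in the study) is commonly implemented with the Kabsch algorithm. The NumPy sketch below is a generic illustration, not HyperChem's routine: it finds the rotation and centroids that best superimpose the fitted atoms (e.g., ring heavy atoms) and returns them so that the same transform can then be applied to every atom of the monomer, including hydrogens. The function names are hypothetical.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation and centroids that best superimpose `mobile` onto `target` (both N x 3).

    Returns (rotation, mobile_centroid, target_centroid) minimizing the RMS deviation.
    """
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(target, dtype=float)
    p_c, q_c = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_c).T @ (q - q_c)               # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rotation, p_c, q_c


def apply_transform(rotation, p_c, q_c, coords):
    """Apply the fitted transform to any coordinates of the mobile molecule."""
    return (np.asarray(coords, dtype=float) - p_c) @ rotation.T + q_c


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered N x 3 arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))
```

Fitting only the ring heavy atoms and then transforming the full monomer mirrors the alignment described in the text, while leaving the optimized internal geometry untouched.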

Interaction energies 

Quantum chemical calculations were used to determine the strength of the intermolecular forces acting between the nucleobase and amino acid (π-π interactions) and between the sugar and amino acid (sugar-π interactions) based on the dimer geometries discussed in the previous section. Specifically, the interaction or binding energy (ΔE) was calculated according to Equation (1).

$$\Delta E = E_{\mathrm{dimer}} - E_{\mathrm{aa}} - E_{\mathrm{nt}} \qquad (1)$$



In this equation, $E_{\mathrm{dimer}}$ stands for the electronic energy of the π-π stacking, T-shaped or sugar-π dimer, while $E_{\mathrm{aa}}$ and $E_{\mathrm{nt}}$ stand for the electronic energies of the isolated subsystems (the aromatic amino acid (aa) and the nucleobase or deoxyribose subunit of the nucleotide (nt), respectively). The geometry of each isolated monomer is the same as its structure in the dimer. The calculated interaction energy does not include zero-point vibrational or Gibbs energy corrections. Furthermore, the binding energies were calculated in the gas phase and are therefore relevant to DNA-protein binding environments of low polarity (95). We acknowledge that polar environments will likely decrease the magnitude of the reported interaction energies, as well as diminish the impact of His protonation. Nevertheless, previous work has shown that π-π and cation-π interactions are of significant strength in more polar environments (41,49,51). Future work should consider the effects of solvation and thereby extend our conclusions to all DNA-protein binding environments, including the rarer high-polarity active sites.
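Equation (1) amounts to a single subtraction once the three electronic energies are available. The sketch below assumes energies in hartree from the same level of theory and converts the result to kJ mol⁻¹ (1 hartree ≈ 2625.50 kJ mol⁻¹). No counterpoise correction is applied here, and the function name is hypothetical.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol (rounded)


def interaction_energy(e_dimer: float, e_aa: float, e_nt: float) -> float:
    """Equation (1): dE = E_dimer - E_aa - E_nt.

    Inputs are electronic energies in hartree from the same level of theory, with
    each monomer evaluated at the geometry it adopts in the dimer. The result is in
    kJ/mol; negative values indicate an attractive (binding) interaction.
    """
    return (e_dimer - e_aa - e_nt) * HARTREE_TO_KJ_PER_MOL
```

For example, hypothetical energies of E_dimer = -1000.010 hartree and E_aa = E_nt = -500.000 hartree give ΔE ≈ -26.25 kJ mol⁻¹.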

To identify a quantum chemical method that best balances accuracy and computational cost given the large number of contacts identified, the binding strength of select dimers that span the range of interactions found in the PDB search was calculated with several levels of theory (Supplementary Table S1). The M06-2X density functional theory (DFT) functional was chosen (with both 6-31+G(d,p) and aug-cc-pVTZ basis sets) based on literature testing the ability of this functional to accurately describe carbohydrate-π contacts (96), as well as DNA-protein nucleobase-amino acid π-π contacts (48,50). However, other DFT functionals that were originally developed to account for dispersion interactions and have proven to work well for noncovalent contacts (97,98) were also considered, namely B3LYP-D3, B97-D3 and ωB97X-D (with aug-cc-pVTZ basis sets). The DFT results were validated using highly accurate CCSD(T) calculations at the complete basis set (CBS) limit. To obtain CCSD(T)/CBS estimates, MP2/CBS energies were determined using the aug-cc-pVDZ and aug-cc-pVTZ basis sets with Helgaker's extrapolation scheme (99,100), and the differences between the (counterpoise-corrected) MP2 and CCSD(T) energies were calculated with aug-cc-pVDZ and added to the MP2/CBS values. We note that these energies are denoted as CCSD(T)/CBS for consistency with our previous work on other DNA-protein interactions (46,48,50), although some literature refers to these extrapolated values as CBS(T) (44,101-106). Furthermore, only slight changes in the interaction energies of nucleobase pairs have been reported upon considering a higher-level triple- to quadruple-zeta extrapolation (107,108).
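The composite CCSD(T)/CBS estimate described above can be sketched as follows. As a simplifying assumption, the two-point X⁻³ formula, E_CBS = (3³E_TZ - 2³E_DZ)/(3³ - 2³), is applied directly to MP2 interaction energies (X = 2 for aug-cc-pVDZ, X = 3 for aug-cc-pVTZ); in Helgaker-type schemes the X⁻³ form is usually applied to the correlation contribution, with the Hartree-Fock part treated separately. The function names are hypothetical.

```python
def helgaker_two_point(e_dz: float, e_tz: float, x_dz: int = 2, x_tz: int = 3) -> float:
    """Two-point X**-3 extrapolation of an energy to the complete basis set limit.

    x_dz and x_tz are the cardinal numbers of aug-cc-pVDZ (2) and aug-cc-pVTZ (3).
    Simplification: applied here to the whole MP2 interaction energy.
    """
    a, b = x_tz ** 3, x_dz ** 3
    return (a * e_tz - b * e_dz) / (a - b)


def ccsdt_cbs_estimate(mp2_dz: float, mp2_tz: float, ccsdt_dz: float) -> float:
    """MP2/CBS plus the CCSD(T) - MP2 difference evaluated with aug-cc-pVDZ."""
    mp2_cbs = helgaker_two_point(mp2_dz, mp2_tz)
    return mp2_cbs + (ccsdt_dz - mp2_dz)
```

The design keeps the expensive CCSD(T) step at the smaller basis set, relying on the CCSD(T) - MP2 difference converging faster with basis set size than the MP2 energy itself.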

Upon changing the M06-2X basis set from 6-31+G(d,p) to aug-cc-pVTZ, the mean unsigned deviation (MUD) for the sugar-π interactions decreases (Supplementary Table S1). However, due to significant errors for the nucleobase-aromatic amino acid π-π interactions, the overall MUD with respect to the CCSD(T)/CBS estimates increases from 1.5 to 2.4 kJ mol⁻¹ upon basis set expansion, along with a substantial increase in computational time. Indeed, M06-2X has been shown to accurately describe other DNA-protein noncovalent interactions with a moderately sized basis set (48,50). In contrast, ωB97X-D/aug-cc-pVTZ describes both broad classes of contacts as accurately as M06-2X/6-31+G(d,p), leading to the same overall MUD but at an increased computational cost. Among the functionals tested, B3LYP-D3/aug-cc-pVTZ performs the best, but again this is coupled with a significantly increased computational cost compared to the efficient M06-2X/6-31+G(d,p) combination. Most importantly, the trends in the interaction energies and the large magnitude of the nucleobase- and sugar-aromatic amino acid π-interactions predicted by M06-2X/6-31+G(d,p) are preserved upon consideration of the CCSD(T)/CBS estimates. Thus, M06-2X/6-31+G(d,p) was confidently used in the present study to compare the strength of many different types of DNA-protein π-interactions.
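The mean unsigned deviation used to rank the methods is the average absolute difference from the reference values. A minimal sketch, assuming two equal-length sequences of interaction energies (a test method and the CCSD(T)/CBS reference) in the same units:

```python
from typing import Sequence


def mean_unsigned_deviation(test: Sequence[float], reference: Sequence[float]) -> float:
    """Average |test - reference| over paired values, in the units of the inputs."""
    if len(test) != len(reference) or not test:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - r) for t, r in zip(test, reference)) / len(test)
```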

Software 

All M06-2X, MP2 and CCSD(T) calculations were performed with program defaults using Gaussian 09 (revisions A.02 and C.01) (109), while all DFT-D and DFT-D3 calculations were performed using Q-Chem 4.0.1.0 (110).

RESULTS 

Crystal structure analysis of nucleobase-aromatic amino 
acid contacts in nature 

Overall distribution of contacts in DNA-protein complexes.
Among the 428 crystal structures considered in the present work, 175 (41%) contain at least one nucleobase-amino acid stacking or T-shaped interaction, with 344 such interactions identified in total. Most of the 175 crystal structures contain one or two interactions, but as many as 13 contacts can be found in a single structure (Figure 2A). These interactions occur in a wide variety of proteins, including DNA-binding and transcription proteins, with approximately 38% of the π-π contacts identified in transferase proteins and 25% in hydrolase proteins (Figure 2B).
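The percentages quoted here and in the Abstract follow directly from the reported counts; the short check below reproduces their rounding. The helper that tabulates contacts per structure (as in Figure 2A) is a hypothetical sketch that assumes one list entry, a PDB ID, per identified contact, so it only covers structures with at least one contact.

```python
from collections import Counter
from typing import Dict, Iterable

# Counts reported in the text
STRUCTURES_SEARCHED = 428
STRUCTURES_WITH_PI_PI = 175
STRUCTURES_WITHOUT_CONTACTS = 164
PI_PI_CONTACTS = 344
SUGAR_PI_CONTACTS = 230
ALL_CONTACTS = PI_PI_CONTACTS + SUGAR_PI_CONTACTS  # 574


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


def contacts_per_structure(contact_ids: Iterable[str]) -> Dict[int, float]:
    """Map 'number of contacts in a structure' to the % of structures with that number."""
    per_structure = Counter(contact_ids)          # PDB ID -> number of contacts
    histogram = Counter(per_structure.values())   # number of contacts -> structures
    total = len(per_structure)
    return {n: percent(k, total) for n, k in sorted(histogram.items())}


for label, part, whole in [
    ("structures with pi-pi contacts", STRUCTURES_WITH_PI_PI, STRUCTURES_SEARCHED),
    ("structures without contacts", STRUCTURES_WITHOUT_CONTACTS, STRUCTURES_SEARCHED),
    ("pi-pi share of all contacts", PI_PI_CONTACTS, ALL_CONTACTS),
    ("sugar-pi share of all contacts", SUGAR_PI_CONTACTS, ALL_CONTACTS),
]:
    print(f"{label}: {percent(part, whole):.0f}%")  # prints 41%, 38%, 60% and 40%
```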

Occurrence of nucleobases and aromatic amino acids in contacts. Pyrimidines are involved in more π-π interactions than purines (Figure 3A), with the population decreasing with respect to the nucleobase as T > C > A ≈ G. Specifically, 37% of the contacts involve T, with the remainder distributed relatively equally among the other bases (~20% each). When the distribution is considered as a function of the amino acid (Figure 3B), significantly more interactions are found with Phe (44%) and Tyr (32%) than with either His (11%) or Trp (13%). However, Trp is the least common amino acid (~1% abundance), which may explain the fewer contacts identified with this residue. On the other hand, Tyr, Phe and His have similar natural abundances (3-4%), and therefore our results suggest that His is less likely to form π-π stacking or T-shaped interactions with a DNA nucleobase. When all nucleobase-amino acid combinations are considered (Figure 3C), Phe, Tyr and Trp contacts decrease in abundance with respect to the nucleobase as T > C > A ≈ G, while His forms the most contacts with C (the second most frequently observed nucleobase overall) and does not form any contacts with G.
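The reasoning about natural abundance can be made explicit by dividing each amino acid's share of the π-π contacts by its natural abundance. The contact shares below come from the text; the abundances are an assumption for illustration only (~1% for Trp, and a hypothetical 3.5% midpoint of the quoted 3-4% range for Phe, Tyr and His). Because the contact shares are taken over the aromatic residues only, just the relative sizes of the ratios are meaningful.

```python
# Share of nucleobase-amino acid pi-pi contacts (%), from the text.
CONTACT_SHARE = {"Phe": 44.0, "Tyr": 32.0, "His": 11.0, "Trp": 13.0}

# ASSUMED natural abundances (%): ~1% for Trp as quoted; 3.5% is a hypothetical
# midpoint of the 3-4% range quoted for Phe, Tyr and His.
NATURAL_ABUNDANCE = {"Phe": 3.5, "Tyr": 3.5, "His": 3.5, "Trp": 1.0}


def enrichment(share, abundance):
    """Contact share divided by natural abundance; compare values only relative to each other."""
    return {aa: share[aa] / abundance[aa] for aa in share}


ratios = enrichment(CONTACT_SHARE, NATURAL_ABUNDANCE)
for aa, r in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{aa}: {r:.1f}")
```

Under these assumptions the ratios are roughly 13.0 for Trp, 12.6 for Phe, 9.1 for Tyr and 3.1 for His, consistent with the suggestion that the small number of Trp contacts mainly reflects its low abundance, whereas His is genuinely under-represented.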



Figure 2. (A) Number of nucleobase-amino acid stacking/T-shaped interactions identified in PDB structures in the present study (x-axis: number of interactions per structure, 0-13; y-axis in %). (B) Types of proteins in which nucleobase-amino acid stacking/T-shaped interactions were found. (C) Overall composition of the proteins in the crystal structures considered in the present work. Protein classes in (B) and (C): hydrolase/DNA, transferase/DNA, transcription/DNA, DNA binding protein/DNA, oxidoreductase/DNA, transport protein/DNA, lyase/DNA and other.

Figure 3. The proportions of (A) nucleobases, (B) amino acids and (C) nucleobase-amino acid combinations in DNA-protein π-π stacked and T-shaped orientations found in nature.

Relative abundance of face-to-face and face-to-edge π-π binding arrangements

The nucleobase-amino acid π-π contacts adopt conformations ranging from stacked (ω = 0-20°) to T-shaped (ω = 70-90°) orientations (Figure 4). However, the stacked orientation is substantially more common (58%) than the T-shaped configuration (13%). The T-shaped interactions are also less frequent than the inclined structures (ω = 20-70°, 29%, Figure 4), but this is due to the large number of angles in the inclined category, while the frequencies for a given angle in the T-shaped and inclined categories are nearly equal (approximately < 5%). Within the π-π stacking interactions, the dimers more commonly adopt a tilt of 5-10° rather than a perfectly parallel orientation (ω = 0°). Conversely, the perfectly perpendicular arrangement (ω = 90°) is the preferred T-shaped configuration. The most common inclined structures involve either ω = 25-30° or a maximum tilt of 45-50° (Figure 4).

Figure 4. Frequency of tilt angle (degrees) between the ring planes for all interactions according to the (A) nucleobase or (B) amino acid.
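Tilt-angle frequencies such as those in Figure 4 can be tabulated with fixed-width bins. The sketch below assumes 5° bins with half-open intervals [0, 5), [5, 10), ..., placing ω = 90° in the last bin; the exact binning convention used for Figure 4 is not stated, so this is an assumption.

```python
from typing import Dict, Sequence, Tuple


def tilt_histogram(angles: Sequence[float], width: float = 5.0) -> Dict[Tuple[float, float], float]:
    """Percentage of dimers per tilt bin [0, w), [w, 2w), ..., with 90 deg in the last bin."""
    nbins = int(90 // width)
    counts = [0] * nbins
    for omega in angles:
        if not 0.0 <= omega <= 90.0:
            raise ValueError(f"tilt angle out of range: {omega}")
        counts[min(int(omega // width), nbins - 1)] += 1
    n = len(angles)
    return {(i * width, (i + 1) * width): 100.0 * c / n for i, c in enumerate(counts)}
```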

Dependence of π-π binding arrangement on the nucleobase.
A correlation exists between the nucleobase in the dimer and the tilt angle adopted (Figure 4A). Specifically, although all nucleobases prefer a stacked orientation, the largest frequency occurs with ω = 5-10° for T, C and A, but with ω = 10-15° for G. Among the inclined orientations, C and G prefer only slight deviations from stacking (ω = 25-35°), T prefers the maximum degree of tilt (ω = 45-50°) and A rarely adopts an inclined orientation (< 5% frequency for ω = 30-70°). Cytosine is the most likely nucleobase to adopt a T-shaped structure (15% frequency for ω = 85-90°). Although A and T also adopt T-shaped orientations with > 10% frequency, G rarely forms a T-shaped dimer (< 5% frequency). Interestingly, A is only found in a T-shaped orientation with Phe. Furthermore, 74% of the identified T-shaped interactions and 21% of the inclined interactions involve a nucleobase edge and an amino acid face.

Dependence of π-π binding arrangement on the amino acid.
As discussed for the nucleobases, all amino acids show a preference for the ω = 5-10° stacked orientation, except His, which equally prefers a 0-5° tilt (Figure 4B). In fact, His and Trp are rarely found in any orientation besides a stacked structure (5 and 8% frequency for ω = 20-90°, respectively). Although Tyr adopts almost the full range of tilt angles, a stacked or slightly tilted orientation is most frequently adopted. Unlike the other amino acids, Phe exhibits a substantial occupancy of both inclined (ω = 45-50°) and T-shaped (ω = 85-90°) orientations (32 and 20%, respectively).

Trends in the distances between monomers. In addition to the varied tilt angles adopted by the nucleobase-amino acid dimers, many different separation distances are observed (Supplementary Figure S4A). Overall, the closest heavy atom distances fall between 3.0 and 4.2 Å in the nucleobase-amino acid π-π dimers, with nearly a quarter of all interactions adopting a 3.5 Å separation. Interestingly, there is no clear correlation between the separation distance and tilt angle (Supplementary Figure S4B). Furthermore, unlike the tilt angle, for which each nucleobase prefers a different value, all bases follow the same trend in the preferred separation distance (Supplementary Figure S4C). Conversely, the amino acids do not follow a particular trend in the separation distance. Specifically, Tyr adopts a large range of distances and His generally adopts shorter distances (< 5% occupancy of distances greater than 3.7 Å; Supplementary Figure S4D), while Phe and Trp display the same overall trend as across all π-π contacts.

Quantum chemical calculations of nucleobase-aromatic 
amino acid interaction energies 

The discussion above shows that nucleobase-amino acid dimers adopt a wide range of π-π structures, and it is therefore not surprising that the dimers also span a significant range of binding strengths (Figure 5). The magnitude of the nucleobase-amino acid stacking or T-shaped π-π interaction depends on several factors, such as the relative monomer orientation (including the tilt angle) and the identity of the nucleobase and amino acid. For all DNA-protein pairs, the largest (most negative) binding energy occurs when the amino acid and nucleobase adopt a stacked (ω = 0-20°) rather than a T-shaped (ω = 70-90°) orientation. Apart from the maximum interaction energies generally occurring for T and G, the most dominant trends depend on the amino acid. Therefore, notable features of the binding energies are discussed below as a function of the amino acid.
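The two geometric descriptors used above, the closest heavy atom distance and the tilt angle ω between the ring planes, can be computed from atomic coordinates roughly as in the minimal Python sketch below. It is an illustration under stated assumptions, not the procedure used in this work: ring normals are estimated with Newell's method, coordinates are assumed to be in Å with ring atoms listed in ring order, and the hypothetical `classify_orientation` helper simply reuses the 0-20° (stacked) and 70-90° (T-shaped) ranges quoted in the text, labelling anything in between as tilted.

```python
import math
from itertools import product
from typing import List, Tuple

Coord = Tuple[float, float, float]


def closest_heavy_atom_distance(atoms_a: List[Coord], atoms_b: List[Coord]) -> float:
    """Smallest interatomic distance (input units, e.g. Å) between two monomers."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def ring_normal(ring: List[Coord]) -> Coord:
    """Normal of an approximately planar ring by Newell's method (atoms in ring order)."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    return (nx, ny, nz)


def tilt_angle(ring_a: List[Coord], ring_b: List[Coord]) -> float:
    """Angle between the two ring planes, folded into the range 0-90 degrees."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    dot = abs(sum(p * q for p, q in zip(na, nb)))
    cos_t = dot / (math.hypot(*na) * math.hypot(*nb))
    return math.degrees(math.acos(min(1.0, cos_t)))


def classify_orientation(omega: float) -> str:
    """Hypothetical labelling that reuses the tilt ranges quoted in the text."""
    if omega <= 20.0:
        return "stacked"
    if omega >= 70.0:
        return "T-shaped"
    return "tilted"


# Toy example: two idealised six-membered rings 3.5 Å apart (not PDB data).
ring = [(1.4 * math.cos(math.radians(60 * k)), 1.4 * math.sin(math.radians(60 * k)), 0.0)
        for k in range(6)]
above = [(x, y, z + 3.5) for x, y, z in ring]
omega = tilt_angle(ring, above)
print(round(omega, 1), classify_orientation(omega), round(closest_heavy_atom_distance(ring, above), 2))
# prints: 0.0 stacked 3.5
```

Real ring coordinates are never perfectly planar, which is why an averaged normal (rather than a cross product of two arbitrary bonds) is used here.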

Phenylalanine. Phe interactions are up to -26.3 kJ mol⁻¹. In the stacked orientation, G or T generally leads to stronger contacts than A or C, while in the T-shaped orientation, G or C contacts are generally stronger than T or A contacts (Figure 5A). This leads to, for example, an 18.8 kJ mol⁻¹ energy difference between the strongest T:Phe stacking and T-shaped dimers (Figure 5A).

Tryptophan. Similarly, Trp interactions are up to -31.3 kJ mol⁻¹, with the strongest stacking interactions occurring with T or G (Figure 5B). However, no general conclusions about the strength of Trp T-shaped interactions can be drawn since only one such contact was identified (Figure 5B).

Tyrosine. Unlike Trp and Phe, Tyr can adopt multiple conformations when stacked with the nucleobases, which differ in the orientation of the hydroxyl moiety (Supplementary Figure S1). However, the hydroxyl orientation has a negligible effect on the binding energy, with less than a 5 kJ mol⁻¹ energy difference between the two conformations for 74% of the interactions considered (Figure 5C). As discussed for Phe and Trp, Tyr interactions are stronger in the stacked than in the T-shaped orientation, with the largest deviation (up to 28.7 kJ mol⁻¹) occurring for T dimers (Figure 5C). The overall strongest Tyr interaction occurs with C (-31.6 kJ mol⁻¹; Figure 5C). Tyr-nucleobase interactions are similar in strength to the corresponding Phe contacts. Furthermore, although Tyr, Phe and Trp bind most strongly to the pyrimidines, there is only a 5 kJ mol⁻¹ difference in the corresponding strongest interaction energies for these three amino acids.

Histidine. Similar to Tyr, (neutral) His can adopt two orientations (protonation states) with respect to the nucleobase (Supplementary Figure S1). However, unlike Tyr interactions, His contacts are highly dependent on the amino acid orientation, with 60% of the structures considered displaying a greater than 10 kJ mol⁻¹ energy difference upon a change in His orientation and the largest difference (18 kJ mol⁻¹) occurring in a C dimer (Figure 5D). The greatest number of contacts and the strongest interactions (-27.1 kJ mol⁻¹) with (neutral) His occur when stacked with C, which contrasts with the greatest number and strongest interactions found with T for all other amino acids. As previously mentioned, very few His contacts were found to adopt a T-shaped orientation in nature (Figure 5D); the only T-shaped interaction is -5.0 kJ mol⁻¹ and occurs with A. Interactions with cationic His are up to -48.7 kJ mol⁻¹, which is 21.6 kJ mol⁻¹ stronger than the neutral dimer. As for neutral His, the strongest interaction for cationic His occurs when stacked with C. Interestingly, although the interaction strengths between His and A, G or C always increase upon protonation, the interaction strengths with T decrease. The different behaviour of T:His dimers upon protonation has been previously noted in the literature (49) and is attributed to the more positive π-system of T compared to the other nucleobases.
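Statements such as "less than a 5 kJ mol⁻¹ energy difference between the two conformations for 74% of the interactions" (Tyr) or "greater than 10 kJ mol⁻¹ ... for 60% of the structures" (His) are threshold counts over paired conformer energies. The sketch below shows one way such fractions could be computed; the function name and the toy energies are hypothetical and are not data from this study.

```python
from typing import Iterable, Tuple


def fraction_by_difference(pairs: Iterable[Tuple[float, float]],
                           threshold: float, below: bool = True) -> float:
    """Fraction of dimers whose two conformer energies (kJ/mol) differ by less than
    (below=True) or by more than (below=False) `threshold`."""
    diffs = [abs(e1 - e2) for e1, e2 in pairs]
    if not diffs:
        raise ValueError("no energy pairs supplied")
    hits = sum(1 for d in diffs if (d < threshold if below else d > threshold))
    return hits / len(diffs)


# Hypothetical (conformer 1, conformer 2) binding energies in kJ/mol, illustration only.
toy_pairs = [(-20.0, -22.0), (-15.0, -25.0), (-10.0, -11.0), (-5.0, -18.0)]
print(fraction_by_difference(toy_pairs, 5.0))                # 0.5  (differences < 5)
print(fraction_by_difference(toy_pairs, 10.0, below=False))  # 0.25 (differences > 10)
```

Strict inequalities are used so that a difference exactly equal to the threshold (10.0 kJ mol⁻¹ in the toy data) is counted in neither tail.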

Crystal structure analysis of deoxyribose sugar-aromatic 
amino acid contacts in nature 

Overall distribution of sugar-π contacts in DNA-protein complexes. Among the 428 crystal structures searched in the present study, 230 sugar-π contacts were identified in 137 structures. Although crystal structures containing sugar-π contacts typically have only one such interaction, up to six sugar-π contacts can be observed in a single structure (Figure 6A). The sugar-π contacts occur in a wide variety of DNA-binding proteins (Figure 6B). Interestingly, 68% of the structures do not contain a sugar-π interaction (Figure 6A), which is more than the 59% that do not contain a nucleobase-amino acid contact (Figure 2A), while 38% of the structures do not contain any nucleobase π-π or sugar-π interactions (Figure 6C). Nevertheless, both types of amino acid interactions can be found in 11% of the structures; these DNA-protein complexes typically possess one of each type, but can contain up to six of one class and two of the other (Figure 6C).

Figure 6. (A) The number of sugar-π contacts found in each structure. (B) Types of proteins in which sugar-π interactions were found. (C) The number of sugar-π and nucleobase-amino acid interactions observed in crystal structures considered in the present work.

Figure 8. (A) Numbering scheme of the sugar moiety. Representative sugar-π interactions identified in crystal structures for (B) single proton, (C) face, (D) bridged, (E) lone pair and (F) lone pair-proton interactions (the amino acid is represented by a solid black line below the sugar).
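The structure-level percentages in the paragraph on the overall distribution of sugar-π contacts follow by inclusion-exclusion from the counts reported in this work (428 structures searched, 175 with nucleobase-amino acid contacts, 137 with sugar-π contacts and 164 with neither). The short Python check below reproduces them, together with the 60%/40% split of the 574 contacts; it is an arithmetic illustration only.

```python
# Counts as reported in this work.
TOTAL = 428         # crystal structures searched
WITH_BASE = 175     # structures with >= 1 nucleobase-amino acid pi-pi contact
WITH_SUGAR = 137    # structures with >= 1 sugar-pi contact
WITH_NEITHER = 164  # structures with no pi-contact of either kind

with_either = TOTAL - WITH_NEITHER                # 264 structures
with_both = WITH_BASE + WITH_SUGAR - with_either  # inclusion-exclusion: 48 structures


def pct(n: int, d: int = TOTAL) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * n / d)


print(pct(TOTAL - WITH_SUGAR), pct(TOTAL - WITH_BASE), pct(WITH_NEITHER), pct(with_both))
# prints: 68 59 38 11

BASE_CONTACTS, SUGAR_CONTACTS = 344, 230
ALL_CONTACTS = BASE_CONTACTS + SUGAR_CONTACTS  # 574
print(pct(BASE_CONTACTS, ALL_CONTACTS), pct(SUGAR_CONTACTS, ALL_CONTACTS))
# prints: 60 40
```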

Occurrence of aromatic amino acids in sugar-π contacts. Sugar-π interactions occur with all four aromatic amino acids (Figure 7A). However, most sugar-π contacts involve Tyr (45%), closely followed by Phe (36%). In contrast, few sugar-π interactions are found with His (4%), despite a natural abundance similar to that of Phe and Tyr (3-4%). Trp interactions make up 14% of all sugar-π interactions, which is consistent with the relative natural abundance of Trp (1%) in comparison to Tyr and Phe.

Classification of sugar-π contacts in DNA-protein complexes. A variety of contacts occur between the π-systems (faces) of the aromatic amino acids and deoxyribose in nature, which can be classified according to the sugar "edge" (Figure 8). The sugar edge that interacts with the π-system can involve a single proton, two protons (a bridge), three protons (a face), a lone pair, or both a lone pair and a proton (lone pair-proton). Furthermore, these contacts can involve any of the hydrogen atoms in the sugar ring. The bridged and face interactions are the most common in the structures searched, with overall abundances of 33 and 30%, respectively (Figure 7B). While lone pair-proton interactions are fairly uncommon (4%), distinguishing between lone pair-proton and lone pair interactions is difficult; together, these account for 17% of the contacts, which is similar to the proportion of single proton interactions (20%, Figure 7B). Example orientations of the four most common interactions from select crystal structures are provided in Figure 9, which further clarifies the geometry of these contacts in nature.
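The five-way classification above depends only on which sugar atoms face the aromatic ring. The Python sketch below encodes that rule for a set of contacting atom labels. The label format (H1a, H2b, H4, H5a, O4', ...) follows the sugar numbering used in this work, but how contacting atoms are detected (distance or angle cut-offs) is not specified here, so the input set is assumed to be already determined; the function and constant names are hypothetical.

```python
from typing import Iterable

# Sugar/backbone oxygens that could present a lone pair to the ring (assumed set).
LONE_PAIR_ATOMS = {"O4'", "O3'", "O5'"}


def classify_sugar_edge(contact_atoms: Iterable[str]) -> str:
    """Assign a sugar-pi contact to a category from the sugar atoms facing the ring."""
    atoms = set(contact_atoms)
    lone_pairs = atoms & LONE_PAIR_ATOMS
    protons = {a for a in atoms if a.startswith("H")}
    if lone_pairs and protons:
        return "lone pair-proton"
    if lone_pairs:
        return "lone pair"
    names = {1: "single proton", 2: "bridged", 3: "face"}
    if len(protons) in names:
        return names[len(protons)]
    raise ValueError(f"unrecognised contact pattern: {sorted(atoms)}")


print(classify_sugar_edge({"H4", "H5a", "H5b"}))  # face
print(classify_sugar_edge({"O4'", "H4"}))         # lone pair-proton
```

Because the lone pair-proton and lone pair classes are hard to separate in practice, a real implementation would need an explicit rule for when a nearby proton counts as part of the contact.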



Relative monomer orientations in sugar-π contacts. Figure 9 displays overlays of all contacts identified for each of the four most common sugar-π contacts, which were obtained using RMS fitting of the sugar atoms involved in the interaction. These representative examples show that the sugar-π interactions display significant variation in the amino acid position, which covers nearly all relative monomer orientations for a given sugar-edge type and leads to a continuum between the edges. Variations in the sugar are also evident from the overlays, which mainly arise from different sugar puckering in the crystal structures.
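Overlays of this kind require superimposing the matched sugar atoms of each contact onto a reference by least-squares (RMS) fitting and then carrying the amino acid along with the same transformation. The text does not say which algorithm was used; the dependency-free Python sketch below uses Horn's closed-form quaternion method, which yields the same optimal rotation as the more familiar Kabsch algorithm, with a small Jacobi eigensolver in place of a linear-algebra library. Atom correspondence is assumed to be given by list order, and all names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


def _centroid(pts: List[Vec]) -> Vec:
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))


def _jacobi_eigen(a: List[List[float]], sweeps: int = 50):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superpose(mobile: List[Vec], reference: List[Vec],
              carry: Optional[List[Vec]] = None) -> List[Vec]:
    """Fit `mobile` onto `reference`; return `carry` (default: `mobile`) transformed."""
    cm, cr = _centroid(mobile), _centroid(reference)
    m = [tuple(p[i] - cm[i] for i in range(3)) for p in mobile]
    r = [tuple(p[i] - cr[i] for i in range(3)) for p in reference]
    S = [[sum(a[i] * b[j] for a, b in zip(m, r)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    vals, vecs = _jacobi_eigen(N)
    k = max(range(4), key=lambda i: vals[i])
    w, x, y, z = (vecs[i][k] for i in range(4))  # unit quaternion of the best rotation
    R = [[w*w + x*x - y*y - z*z, 2 * (x*y - w*z), 2 * (x*z + w*y)],
         [2 * (x*y + w*z), w*w - x*x + y*y - z*z, 2 * (y*z - w*x)],
         [2 * (x*z - w*y), 2 * (y*z + w*x), w*w - x*x - y*y + z*z]]
    targets = mobile if carry is None else carry
    return [tuple(sum(R[i][j] * (p[j] - cm[j]) for j in range(3)) + cr[i] for i in range(3))
            for p in targets]


def rmsd(a: List[Vec], b: List[Vec]) -> float:
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))
```

In an overlay workflow, `mobile` and `reference` would be the matched sugar atoms of two contacts, and `carry` the full contact (sugar plus amino acid) to be placed in the reference frame.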
Dependence of binding arrangement on the sugar atoms involved. Within each category of sugar-π interactions, there is a clear preference for contacts with certain atoms (Figure 10). For example, single proton interactions occur with H5a more than twice as frequently as with any other proton. Similarly, the H1a-H2b bridged contact occurs more than three times as often as any other contact in this category, and the H4-H5a-H5b contact dominates the face class and is in fact the most frequent sugar-π interaction overall (25% frequency). All lone pair interactions identified involve O4' (rather than the O5' or O3' phosphate backbone atoms) and more frequently do not involve a proton. When O4' lone pair-proton interactions occur, contacts involving H4 are twice as likely as those involving H1a.

Dependence of binding arrangement on the amino acid. Within a given type of interaction, certain amino acids are more prevalent (Figure 10). Specifically, single proton interactions are most common with Tyr. On the other hand, lone pair and bridged interactions involving each of the four aromatic amino acids can be identified, with Tyr or Phe involved in the majority of the contacts. Conversely, Trp and Tyr account for approximately two-thirds of all face interactions. When the trend is instead considered as a function of the amino acid and the interaction adopted (Supplementary Figure S5), substantial variation in the types of contacts identified for each amino acid is noted. Trp forms only four types of sugar-π interactions in the crystal structures searched, which is fewer than any other amino acid and does not include a single proton contact. The H4-H5a-H5b face interaction makes up 76% of all sugar-Trp interactions, while the other three Trp interactions include two O4' interactions and the H2a-H5b bridged interaction. Unlike Trp, His forms seven different sugar-π interactions that span all four categories of sugar-π contacts, with the O4' interaction being the most common (30%) and the H5b interaction also prevalent (20%, Supplementary Figure S5). In addition to being significantly more common, interactions with Phe and Tyr are markedly more varied, with more than 8 and 15 types of contacts found, respectively (Supplementary Figure S5). The most prevalent sugar-π Phe interaction is the H1a-H2b bridged interaction (43%), and Phe bridged interactions are in general considerably more common (59%) than face, lone pair and single proton contacts (19%, 16% and 13%, respectively). Unlike the other amino acids, Tyr does not substantially prefer one specific interaction. However, Tyr shares some similarities with the other amino acids: three of the four most common Tyr interactions are H4-H5a-H5b (most common for Trp), O4' (most common for His) and H1a-H2b (most common for Phe).

Quantum chemical calculations of deoxyribose sugar- 
aromatic amino acid interaction energies 

The previous section shows that sugar-π interactions with the aromatic amino acids can adopt many different orientations in DNA-protein complexes. This structural variation leads to binding strengths for (neutral) sugar-π interactions between approximately 0 and -30 kJ mol⁻¹ (Figure 11). Interactions with Trp are particularly strong, reaching up to -29.3 kJ mol⁻¹, and are generally more stable than -20 kJ mol⁻¹. Interactions with Tyr can also be strong (up to -31.6 kJ mol⁻¹), but cover the full range of binding energies (i.e. from 0 to -30 kJ mol⁻¹). In general, the Tyr interactions do not greatly depend on the orientation of the hydroxyl moiety, with 86% of all sugar-Tyr interactions displaying a less than 5 kJ mol⁻¹ difference between the two orientations, but the dependence can be up to 22.1 kJ mol⁻¹ when a hydrogen bond forms in addition to the sugar-π interaction. Conversely, although Phe and (neutral) His contacts are generally weaker, they exhibit a significant range (from 0 to -20 kJ mol⁻¹; Figure 11). Similar to Tyr, the His binding strength depends on the amino acid orientation, by 0.1-20 kJ mol⁻¹. The overall strongest sugar-π contacts typically occur when His is cationic (especially when interacting with O4'), with binding strengths of up to -68.2 kJ mol⁻¹.

Dependence on sugar edge. Among all sugar edge-aromatic amino acid combinations, only interactions with H2a, H2b, O4'-H4, H2a-H3, H1a-H2b and H4-H5a-H5b have (neutral) interaction energies stronger than -20 kJ mol⁻¹, and these occur only with Trp and Tyr. The strongest interactions with Trp, Tyr, Phe and (neutral) His occur for H4-H5a-H5b (-29.3 kJ mol⁻¹), H5a (-31.6 kJ mol⁻¹), H1a-H2b-H4 (-16.2 kJ mol⁻¹) and H1a-H2b (-18.9 kJ mol⁻¹), respectively. The overall four strongest interactions are the H4-H5a-H5b dimer (-29.3 kJ mol⁻¹), followed by the H1a-H2b (-24.1 kJ mol⁻¹), O4' (-22.3 kJ mol⁻¹) and H5a (-18.4 kJ mol⁻¹) contacts (Figure 11). Furthermore, the binding strength of these four structures can vary by up to approximately 25 kJ mol⁻¹ due to differences in the relative orientation of the amino acid residue (Figure 11).

DISCUSSION 

Abundance of nucleobase-aromatic amino acid π-π interactions

In the 428 crystal structures searched for DNA-protein π-interactions (see Supplementary Data), 344 nucleobase-aromatic amino acid π-π contacts were identified and, for the first time in the literature, unambiguously confirmed through visual inspection. These contacts were found in all types of proteins (Figure 2B). However, the protein distribution directly correlates with the protein composition of the DNA complexes investigated (Figure 2C), which suggests that the observed distribution is a consequence of the structures searched rather than of one protein class being more likely to rely on nucleobase-amino acid π-π interactions.

Structure of nucleobase-aromatic amino acid π-π interactions

Among the nucleobase interactions identified, stacked orientations (with a 5-10° angle (tilt) between ring planes) are more prevalent than T-shaped arrangements in a 3:2 ratio (Figure 4). Nevertheless, structures ranging from perfectly parallel to perfectly perpendicular relative monomer orientations appear in nature. Interestingly, the typical closest heavy atom-heavy atom distance between the two monomers (3.5 Å; Supplementary Figure S4) matches the preferred distance previously identified in computational studies of isolated dimers (45,46), and therefore some features of the relative monomer orientations in crystal structures may arise from the inherent nature of the interactions.



Composition of nucleobase-aromatic amino acid π-π interactions

The pyrimidines are more likely to be involved in π-π interactions with aromatic amino acids than the purines (Figure 3A), which contrasts with expectations that a larger ring size may lead to more π-interactions in nature due to greater possible overlap. In terms of the amino acids, more interactions occur with Phe and Tyr than with Trp and His in nature (Figure 3B), which does not directly relate to the relative natural abundances of these amino acids. This finding also contrasts with previous literature that reports His to be the aromatic amino acid most likely to be involved in DNA-protein π-π interactions (8). Furthermore, we observe that Phe, Tyr and Trp contacts decrease in abundance with respect to the nucleobase as T > C > A ≈ G, whereas His forms the most contacts with C, and no contacts between His and G were identified (Figure 3C). These findings contrast with previous reports that His selectively binds to T and G, while Phe selectively binds to T and A (7,8). Discrepancies between the present study and previous work may arise from the careful visual inspection implemented herein as additional verification prior to classifying the π-π interactions.

Strength of nucleobase-aromatic amino acid π-π interactions

Since there is a large variation in the geometry of nucleobase-amino acid π-π interactions in nature (Figure 4), it is not surprising that there is also significant variation in the calculated binding strengths (Figure 5), as reported previously in computational studies of isolated dimers (40-47) or select crystal structure geometries (8,34,40-42). The nucleobase-amino acid π-π interactions reach magnitudes of up to approximately -30 kJ mol⁻¹ and vary with the monomers involved and their relative orientation (with stacked structures being more stable than T-shaped ones). However, the trends in the binding strengths are not always the same as those found by considering two monomers in the absence of geometrical constraints imposed by an enzyme (45-47,49-57). Interestingly, most interactions identified in nature are on average 4.9 kJ mol⁻¹ weaker than the corresponding optimal interaction previously reported between two monomers in the absence of such constraints (Supplementary Table S2) (45-47,49-57). This difference arises from deviations in the geometries (Supplementary Table S2), including greater separation distances and tilt in the crystal structures, which likely result from constraints imposed by the protein, compared with the perfectly parallel (stacked) or perpendicular (T-shaped) monomer arrangements implemented in the potential energy surface searches. The perfectly stacked or T-shaped orientations, as well as the step size implemented in previous calculations, also explain why three of the interaction energies calculated in the natural orientations are slightly stronger than the "optimal" values identified by searching the potential energy surface. These features underscore the influence of the relative monomer orientations on the binding strengths. In agreement with previous studies of charged DNA-protein interactions (41,49,50,53) and reports that π-π and π+-π interactions are distinct (111), cationic His has significantly stronger interactions than the neutral amino acids, with interaction energies of up to approximately -50 kJ mol⁻¹.

Biological relevance of nucleobase-aromatic amino acid π-π interactions

Nucleobase-aromatic amino acid π-π interactions have been implicated in the discriminatory and catalytic removal of damaged bases from the human genome by the DNA repair enzyme alkyladenine DNA glycosylase (AAG) (4,112). Specifically, unlike other DNA repair enzymes in the same glycosylase family, the active site of AAG is lined with three aromatic amino acids and there is limited hydrogen bonding to the substrate (Figure 12A). Although the resolution of the associated crystal structure (PDB ID: 1EWN) is lower than the criteria used to select PDB structures in this study, and the interactions occur with a damaged nucleobase, the strengths of contacts between AAG and the bound substrate, ethenoadenine (εA), were evaluated using the same methodology employed in the present work. Specifically, the interactions were determined to be -24.4 kJ mol⁻¹ for the εA:Tyr127 stacking interaction, -6.9 kJ mol⁻¹ for the εA:His136 tilted (inclined) contact and -1.0 kJ mol⁻¹ for the εA:Tyr159 T-shaped (amino acid-edge) interaction. In particular, the strength of the εA:Tyr127 contact suggests that such active site π-π interactions could be involved in substrate identification and/or binding.

Figure 12. (A) The damaged nucleobase-amino acid π-π interactions in the AAG active site (PDB ID: 1EWN), (B) the natural nucleobase-amino acid π-π interactions in the N6-adenine DNA methyltransferase active site (PDB ID: 1G38) and (C) the sugar-π interaction in the Dpo4 active site (PDB ID: 3QZ8).

The broader implications of the DNA-protein π-π contacts in the AAG active site were determined by a computational study of the associated catalytic mechanism using a full DNA-AAG model and different substrates (112). Specifically, the individual effects of sequentially removing each AAG active site amino acid suggest that the π-rings are catalytic (by approximately 30 kJ mol⁻¹) for the removal of neutral damaged nucleobases, but anti-catalytic for the removal of charged (cationic) alkylated nucleobases (by up to 35 kJ mol⁻¹). Coupled with previous work studying the strength of isolated dimers between a natural/damaged DNA base and an aromatic amino acid (47,51,52,57), a proposal was developed that AAG has evolved to take advantage of active site amino acid π-systems in several ways. First, the flexibility provided by the active site composition (lack of discriminatory hydrogen bonding) explains why AAG can excise many different substrates. Second, the π-π interactions with the substrate maximize the catalytic power towards neutral lesions that are inherently difficult to excise. Finally, although the ability to remove neutral DNA lesions comes at the expense of the excision of cationic lesions, the inherent nature of π+-π interactions (47,51,52,57) allows AAG to more strongly attract and bind cationic lesions.

Although AAG provides an exemplary case of the multiple roles π-π contacts can play in biology, interactions between damaged nucleobases and an aromatic amino acid residue may also be involved in the catalytic mechanism of other enzymes. Repair enzymes such as hUNG2 (113,114) and hOgg1 (115,116) are known to have π-π interactions in their active sites (involving Phe or Tyr), which may contribute towards the catalytic function of these enzymes. Notably, although AAG, hUNG2 and hOgg1 all involve damaged DNA nucleobase active site π-π interactions, π-π interactions are also known to contribute to the binding and catalytic function of proteins that process natural DNA. For example, the extrahelical target A of N6-adenine DNA methyltransferase (PDB ID: 1G38; Figure 12B) forms an active site stacking interaction with Tyr108 (-21.6 kJ mol⁻¹) and a T-shaped interaction with Phe196 (-7.7 kJ mol⁻¹). Furthermore, as discussed for the DNA repair enzymes, the π-π interactions in the active site of N-DNA methyltransferases (including N6-adenine DNA methyltransferase) have been proposed to contribute to catalysis (117).

Abundance of deoxyribose-aromatic amino acid sugar-π interactions

Among the 428 crystal structures searched in the present work, 230 sugar-π contacts between the deoxyribose moiety and an aromatic amino acid were identified. Although a considerable number of nucleobase π-π interactions were expected based on previous literature (7,8,21-23,34), this is the first time that the significance of sugar-π contacts has been highlighted. Indeed, sugar-π contacts represent approximately 40% of all DNA-protein π-contacts found in the present work and therefore occur with nearly the same frequency as nucleobase-amino acid π-π interactions. As discussed for the nucleobase-aromatic amino acid interactions, the sugar-π contacts are found in a variety of different proteins, with relative abundances that mirror the types of proteins searched (Figures 2C and 6B).

Structure of deoxyribose-aromatic amino acid sugar-π interactions

Although only π-interactions between the entire sugar face of pyranose and the aromatic amino acid were considered in previous work (61,62,67,76), a range of sugar-π contacts were identified for deoxyribose in the present study, which can involve a single proton, two protons (a bridge), three protons (a face), a lone pair, or both a lone pair and a proton (lone pair-proton; Figures 7B, 8 and 10). As a result, we introduce a classification system for DNA-protein sugar-π interactions based on the sugar edge participating in the contact, which can yield C-H···π and/or lone-pair···π interactions. In the literature, pyranoses involved in stacking interactions simultaneously participated in hydrogen bonding via a hydroxyl group and/or other van der Waals contact(s) (82-84). Although this preference was not explicitly examined in the present work, such hydrogen-bonding contacts are likely less important in the case of deoxyribose due to the lack of hydroxyl substituents on the sugar in DNA helices (except at the terminal positions). Interestingly, for each class of sugar-π interactions, the amino acid adopts a continuum of positions with respect to the sugar moiety (Figure 9).

Composition of deoxyribose-aromatic amino acid sugar-π interactions

Across the deoxyribose contacts identified in nature, each hydrogen atom in the sugar ring is involved in an interaction with the π-system of an aromatic amino acid (Figure 10). Nevertheless, certain atoms are more prone to participate in particular types of contacts (H5a dominates the single proton, H1a-H2b the bridged and H4-H5a-H5b the face interactions). Furthermore, although the bridged and face interactions are the most common relative monomer arrangements overall (Figure 10), interactions with the ring oxygen (rather than the O3' or O5' phosphate atoms) are also prevalent and are sometimes accompanied by a C-H···π contact.

The abundance of interactions with respect to the amino acid involved (Figure 7A) is similar to that discussed for the amino acid-nucleobase contacts (Figure 3B), with most interactions involving Tyr and Phe. The preferred binding arrangement is different for each amino acid, which likely arises from differences in the relative size of the π-systems. Specifically, Trp displays a preference for face interactions, Phe prefers bridged contacts, and His adopts the most lone pair-π contacts (Figure 10). Although Tyr assumes a wide variety of conformations with respect to the sugar moiety, most single proton interactions occur with Tyr (Figure 10).

Strength of deoxyribose-aromatic amino acid sugar-π interactions

The variation in the sugar-π conformations leads to a significant range in the binding energies (Figure 11), which are as strong as, or even stronger than, nucleobase-amino acid interactions (Figure 5). Indeed, the magnitude of sugar-π contacts found in nature can reach approximately -70 kJ mol⁻¹. Among the neutral dimers, the sugar interactions with Trp are the strongest (most negative), which is consistent with the highly stable nucleobase-Trp interactions found in the present work and reported previously (45,46,50), as well as with carbohydrate-Trp contacts (83). Nevertheless, the strongest interactions overall occur with cationic His, as discussed for the nucleobase π-contacts, and these typically represent lone pair binding arrangements.

Interestingly, although the strongest interactions occur when a pyranose C-H is directed at the center of the aromatic face (76), the amino acid displays a wide range of locations with respect to the sugar in DNA sugar-π contacts. This implies that the sugar composition plays a large role in determining the preferred geometry of the interaction. To gain further fundamental information about sugar-π contacts, calculations similar to those previously conducted for nucleobase-amino acid pairs (45,46,49), which consider the preferred relative orientation of isolated dimers in the absence of an enzyme as well as the associated inherent interaction energy, should be performed for sugars of varying composition.

Biological relevance of deoxyribose-aromatic amino acid sugar-π interactions

Despite the fact that DNA sugar-π contacts with aromatic amino acid residues are rarely discussed in the literature, the importance of analogous carbohydrate-π interactions in many fields (62-74), coupled with the number of contacts found in nature in the present study, suggests that these interactions may also be important for biological processes, either by providing stability to DNA-protein complexes, facilitating DNA binding/recognition, or possibly even playing a greater (catalytic) role. As an example, the DNA polymerases in the RT, Y, X and B families that are involved in crucial cell replication have a conserved Tyr/Phe in their active sites. It has been proposed that this conserved π-containing amino acid uses stacking with the deoxyribose sugar through the R-group and hydrogen bonding with the 3'-OH through the backbone to select deoxyribose nucleotide triphosphates (dNTPs) over ribose nucleotide triphosphates (rNTPs) by factors ranging from approximately 100 (119) to 1 000 000 (118). Indeed, the conserved Tyr/Phe has been referred to as a 'steric gate' since steric clashes may prevent the incorporation of rNTPs (and thereby enhance dNTP incorporation) (120). Nevertheless, the only support for this proposal comes from crystal structures (119,121) or mutational studies (120,122-125) that replace Tyr/Phe by Gly/Ala/Val, which significantly reduces the size of the R-group and removes the π-system.
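To place the quoted dNTP/rNTP selectivities on the same energy scale as the interaction energies in this work, a selectivity ratio can be converted into an approximate free-energy difference with ΔΔG = RT ln(ratio). This conversion is not made in the text; the sketch below assumes that the selectivity reflects a single free-energy difference at 298.15 K, which is a simplification, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def selectivity_to_ddg(ratio: float, temperature: float = 298.15) -> float:
    """Free-energy difference (kJ/mol) equivalent to a selectivity ratio: RT ln(ratio)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return R_KJ * temperature * math.log(ratio)


for ratio in (100, 1_000_000):
    print(ratio, round(selectivity_to_ddg(ratio), 1))
# prints: 100 11.4
#         1000000 34.2
```

On this rough basis, the quoted selectivity range corresponds to about 11-34 kJ mol⁻¹, which is comparable in magnitude to the neutral sugar-π interaction energies calculated in this work (approximately 0 to -30 kJ mol⁻¹).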

In the present work, the sugar-π interactions in crystal structures with a nucleoside triphosphate bound in the active site were re-evaluated and determined to almost exclusively represent either H1a-H2b or H1a-H2b-H4 contacts with Tyr or Phe, depending on the dNTP orientation. A representative example is the H1a-H2b sugar-π interaction between Tyr12 and the incoming dCTP in the Dpo4 active site (a Y-family polymerase; PDB ID: 3QZ8; Figure 12C), which has a corresponding calculated binding energy of -15.6 kJ mol⁻¹ (-12.6 kJ mol⁻¹ for the alternative Tyr hydroxyl orientation; see Supplementary Figure S1 for the definition of the Tyr orientations). This is a significant magnitude and indicates that the sugar-π contact with Tyr12 may be more than simply a steric constraint and, for example, may contribute to the selection of dNTPs over rNTPs. Indeed, modification of the sugar to the corresponding ribose analogue severely impacts this interaction in the polymerase active site, decreasing the closest heavy atom contact distance between the sugar and Tyr planes to 2.126 Å (3.397 Å with deoxyribose present), and the interaction becomes repulsive by approximately 95 kJ mol⁻¹ (with the same hydroxyl orientation, which makes the sugar-π interaction highly repulsive). Although the RNA sugar-π interaction is repulsive, in contrast to the stabilizing interaction with the DNA analogue in the Dpo4 example discussed above, this calculation was performed on a structure obtained by replacing the sugar without geometry relaxation. Therefore, it is possible that different relative monomer orientations in RNA-protein complexes allow sugar-π contacts to be exploited for cellular RNA processing. Nevertheless, this example illustrates the potential importance of DNA sugar-π contacts in human biology.

CONCLUSIONS 

In summary, our calculations yield important insight into the abundance and strength of over 500 DNA-protein interactions in nature. This in turn can be used to estimate the magnitude of similar contacts identified in lower resolution or newly released crystal structures. Most importantly, the present contribution suggests that nucleobase-amino acid contacts are more widespread than perhaps originally believed and highlights the role of novel interactions between the deoxyribose moiety and the aromatic amino acids, which parallel the carbohydrate-π contacts identified in glycobiology (62-68). Furthermore, we confirm for the first time that both broad classes of DNA-protein π-contacts are varied in structure and can provide significant stability to DNA-protein complexes. We therefore propose that the critical role of nucleobase-aromatic amino acid π-π interactions and deoxyribose-aromatic amino acid sugar-π contacts in many biological processes may yet be uncovered. Indeed, examples of both types of DNA-protein contacts can be found in the active sites of enzymes crucial for human survival. Understanding the DNA-protein π-interactions in such systems may lead to advances in nanotechnology (69-74) and drug development (anticancer (4,126,127) or antiviral (128-130)).